Dataset statistics
| Number of variables | 15 |
|---|---|
| Number of observations | 49829 |
| Missing cells | 365211 |
| Missing cells (%) | 48.9% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 5.7 MiB |
| Average record size in memory | 120.0 B |
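The summary figures above can be cross-checked with a little arithmetic. The sketch below recomputes the missing-cell percentage and the memory figures from the table. The 8-bytes-per-column figure is an assumption, chosen because it matches the reported 120.0 B record size; it is not stated by the report.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 15
n_observations = 49_829
missing_cells = 365_211

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: every column costs 8 bytes per row (float64, datetime64,
# or an object pointer), which is what the 120.0 B record size suggests.
bytes_per_record = 8 * n_variables
total_mib = bytes_per_record * n_observations / 2**20

print(f"total cells:   {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"record size:   {bytes_per_record} B, total: {total_mib:.1f} MiB")
```

Under these assumptions the recomputed values round to the 48.9% and 5.7 MiB shown in the table.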
Variable types
| Text | 3 |
|---|---|
| DateTime | 4 |
| Numeric | 5 |
| Categorical | 3 |
Alerts
| Alert | Type |
|---|---|
| ACCOUNT_AGE_MONTHS is highly overall correlated with STATE | High correlation |
| AGE is highly overall correlated with STATE | High correlation |
| BARCODE is highly overall correlated with GENDER and 2 other fields | High correlation |
| GENDER is highly overall correlated with BARCODE and 1 other field | High correlation |
| LANGUAGE is highly overall correlated with BARCODE and 1 other field | High correlation |
| STATE is highly overall correlated with ACCOUNT_AGE_MONTHS and 4 other fields | High correlation |
| LANGUAGE is highly imbalanced (72.9%) | Imbalance |
| BARCODE has 5735 (11.5%) missing values | Missing |
| FINAL_SALE has 12486 (25.1%) missing values | Missing |
| CREATED_DATE has 49570 (99.5%) missing values | Missing |
| BIRTH_DATE has 49570 (99.5%) missing values | Missing |
| STATE has 49570 (99.5%) missing values | Missing |
| LANGUAGE has 49570 (99.5%) missing values | Missing |
| GENDER has 49570 (99.5%) missing values | Missing |
| AGE has 49570 (99.5%) missing values | Missing |
| ACCOUNT_AGE_MONTHS has 49570 (99.5%) missing values | Missing |
| FINAL_SALE is highly skewed (γ₁ = 25.10610229) | Skewed |
| FINAL_QUANTITY has 12491 (25.1%) zeros | Zeros |
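The missing-value alerts can be read as populated-row counts. The sketch below recomputes the percentages from the counts and the 49829 observations reported earlier; the dictionary and helper names are illustrative only.

```python
N_ROWS = 49_829

# Missing-value counts copied from the alerts.
MISSING = {
    "BARCODE": 5_735,
    "FINAL_SALE": 12_486,
    "CREATED_DATE": 49_570,
    "BIRTH_DATE": 49_570,
    "STATE": 49_570,
    "LANGUAGE": 49_570,
    "GENDER": 49_570,
    "AGE": 49_570,
    "ACCOUNT_AGE_MONTHS": 49_570,
}


def missing_pct(count, total=N_ROWS):
    """Share of missing rows, rounded to one decimal like the report."""
    return round(100 * count / total, 1)


# Rows that actually carry a value for each column.
populated = {column: N_ROWS - count for column, count in MISSING.items()}

for column, count in MISSING.items():
    print(f"{column:<20} {missing_pct(count):>5}% missing, "
          f"{populated[column]:>6} populated")
```

The seven user-profile columns share the same count, leaving 259 populated rows each. That agrees with the 518 total characters reported for the two-letter STATE values.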
Reproduction
| Analysis started | 2025-03-11 18:29:11.080762 |
|---|---|
| Analysis finished | 2025-03-11 18:30:42.081820 |
| Duration | 1 minute and 31 seconds |
| Software version | ydata-profiling v0.0.dev0 |
| Download configuration | config.json |
RECEIPT_ID
Text
| Distinct | 24440 |
|---|---|
| Distinct (%) | 49.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 389.4 KiB |
Length
| Max length | 36 |
|---|---|
| Median length | 36 |
| Mean length | 36 |
| Min length | 36 |
Characters and Unicode
| Total characters | 1793844 |
|---|---|
| Distinct characters | 17 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0000d256-4041-4a3e-adc4-5623fb6e0c99 |
|---|---|
| 2nd row | 0001455d-7a92-4a7b-a1d2-c747af1c8fd3 |
| 3rd row | 00017e0a-7851-42fb-bfab-0baa96e23586 |
| 4th row | 000239aa-3478-453d-801e-66a82e39c8af |
| 5th row | 00026b4c-dfe8-49dd-b026-4c2f0fd5c6a1 |
| Value | Count | Frequency (%) |
| 0fb89572-c817-47e2-bd11-6f467baacbb2 | 6 | < 0.1% |
| 79151f8d-0b75-48e2-8bb4-2591bc8c9ca2 | 6 | < 0.1% |
| 98d68d5d-71f1-4528-a83d-cdf6d308c79b | 6 | < 0.1% |
| dd03ea1b-0fae-4bcf-bb55-d7e36eaa75b5 | 6 | < 0.1% |
| a634ba37-2988-46ff-8c61-a4cc4acd4403 | 6 | < 0.1% |
| 4495fbcf-ad2c-4e4f-a77b-ff2ba6984f54 | 6 | < 0.1% |
| d6a313ee-1aa3-4acb-a90d-f0d962ae7b8c | 6 | < 0.1% |
| 171a74cd-7038-43fa-a3ae-de6b6cca5d36 | 6 | < 0.1% |
| 682cb059-74a1-4c47-abd8-5fd6541d88bf | 6 | < 0.1% |
| 6e5ec1d0-e63f-4707-bd6e-78672ecd2a6c | 6 | < 0.1% |
| Other values (24430) | 49769 |
Most occurring characters
| Value | Count | Frequency (%) |
| - | 199316 | 11.1% |
| 4 | 143154 | 8.0% |
| a | 106536 | 5.9% |
| 8 | 106015 | 5.9% |
| 9 | 105788 | 5.9% |
| b | 105214 | 5.9% |
| e | 93920 | 5.2% |
| 7 | 93717 | 5.2% |
| 2 | 93593 | 5.2% |
| c | 93455 | 5.2% |
| Other values (7) | 653136 |
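The fixed length of 36, the 17 distinct characters (16 hexadecimal digits plus the hyphen), and the over-represented digit `4` are all consistent with version-4 UUIDs. The sketch below checks the five sample values with Python's standard `uuid` module; the helper name `is_uuid4` is illustrative, and only the samples are checked, not the whole column.

```python
import uuid

# The five sample values listed for RECEIPT_ID.
SAMPLES = [
    "0000d256-4041-4a3e-adc4-5623fb6e0c99",
    "0001455d-7a92-4a7b-a1d2-c747af1c8fd3",
    "00017e0a-7851-42fb-bfab-0baa96e23586",
    "000239aa-3478-453d-801e-66a82e39c8af",
    "00026b4c-dfe8-49dd-b026-4c2f0fd5c6a1",
]


def is_uuid4(text):
    """True if text is a canonical, lowercase, version-4 UUID string."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        return False
    return parsed.version == 4 and str(parsed) == text


print(all(is_uuid4(value) for value in SAMPLES))

# Four hyphens and 36 characters per value account for the reported totals.
print(4 * 49_829, 36 * 49_829)
```

The hyphen and character totals computed this way equal the 199316 and 1793844 shown in the tables.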
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1793844 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1793844 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1793844 |
PURCHASE_DATE
Date
| Distinct | 89 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 389.4 KiB |
| Minimum | 2024-06-12 00:00:00 |
|---|---|
| Maximum | 2024-09-08 00:00:00 |
SCAN_DATE
Date
| Distinct | 24440 |
|---|---|
| Distinct (%) | 49.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 389.4 KiB |
| Minimum | 2024-06-12 06:36:34.910000+00:00 |
|---|---|
| Maximum | 2024-09-08 23:07:19.836000+00:00 |
STORE_NAME
Text
| Distinct | 954 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 389.4 KiB |
Length
| Max length | 66 |
|---|---|
| Median length | 42 |
| Mean length | 8.7855867 |
| Min length | 1 |
Characters and Unicode
| Total characters | 437777 |
|---|---|
| Distinct characters | 45 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | WALMART |
|---|---|
| 2nd row | ALDI |
| 3rd row | WALMART |
| 4th row | FOOD LION |
| 5th row | RANDALLS |
| Value | Count | Frequency (%) |
| walmart | 21249 | |
| dollar | 4490 | 6.3% |
| store | 2970 | 4.2% |
| general | 2772 | 3.9% |
| aldi | 2632 | 3.7% |
| target | 1484 | 2.1% |
| kroger | 1477 | 2.1% |
| food | 1392 | 2.0% |
| club | 1344 | 1.9% |
| stores | 1290 | 1.8% |
| Other values (1262) | 29894 |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 65946 | |
| R | 48846 | |
| L | 44337 | |
| T | 37344 | |
| E | 32562 | 7.4% |
| M | 27929 | 6.4% |
| W | 24863 | 5.7% |
| O | 23498 | 5.4% |
| (space) | 21191 | 4.8% |
| S | 21125 | 4.8% |
| Other values (35) | 90136 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 437777 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 437777 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 437777 |
USER_ID
Text
| Distinct | 17694 |
|---|---|
| Distinct (%) | 35.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 389.4 KiB |
Length
| Max length | 24 |
|---|---|
| Median length | 24 |
| Mean length | 24 |
| Min length | 24 |
Characters and Unicode
| Total characters | 1195896 |
|---|---|
| Distinct characters | 16 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 63b73a7f3d310dceeabd4758 |
|---|---|
| 2nd row | 62c08877baa38d1a1f6c211a |
| 3rd row | 60842f207ac8b7729e472020 |
| 4th row | 63fcd7cea4f8442c3386b589 |
| 5th row | 6193231ae9b3d75037b0f928 |
| Value | Count | Frequency (%) |
| 64e62de5ca929250373e6cf5 | 22 | < 0.1% |
| 604278958fe03212b47e657b | 20 | < 0.1% |
| 62925c1be942f00613f7365e | 20 | < 0.1% |
| 64063c8880552327897186a5 | 18 | < 0.1% |
| 624dca0770c07012cd5e6c03 | 14 | < 0.1% |
| 6327a07aca87b39d76e03864 | 14 | < 0.1% |
| 609af341659cf474018831fb | 14 | < 0.1% |
| 61d5f5d2c4525a3a478b386b | 13 | < 0.1% |
| 60a5363facc00d347abadc8e | 13 | < 0.1% |
| 65d4915916cc391732127174 | 12 | < 0.1% |
| Other values (17684) | 49669 |
Most occurring characters
| Value | Count | Frequency (%) |
| 6 | 107451 | 9.0% |
| 5 | 85632 | 7.2% |
| 1 | 80743 | 6.8% |
| 3 | 79202 | 6.6% |
| 4 | 76849 | 6.4% |
| 0 | 75962 | 6.4% |
| 2 | 74470 | 6.2% |
| 9 | 71559 | 6.0% |
| d | 70096 | 5.9% |
| c | 69530 | 5.8% |
| Other values (6) | 404402 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1195896 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1195896 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1195896 |
BARCODE
Real number (ℝ)
HIGH CORRELATION  MISSING 
| Distinct | 11027 |
|---|---|
| Distinct (%) | 25.0% |
| Missing | 5735 |
| Missing (%) | 11.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.7157722 × 10¹¹ |
| Minimum | -1 |
|---|---|
| Maximum | 9.347108 × 10¹² |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 8 |
| Negative (%) | < 0.1% |
| Memory size | 389.4 KiB |
Quantile statistics
| Minimum | -1 |
|---|---|
| 5-th percentile | 1.2000214 × 10¹⁰ |
| Q1 | 3.0772126 × 10¹⁰ |
| median | 5.2100038 × 10¹⁰ |
| Q3 | 8.5239935 × 10¹⁰ |
| 95-th percentile | 7.873591 × 10¹¹ |
| Maximum | 9.347108 × 10¹² |
| Range | 9.347108 × 10¹² |
| Interquartile range (IQR) | 5.4467809 × 10¹⁰ |
Descriptive statistics
| Standard deviation | 3.2715345 × 10¹¹ |
|---|---|
| Coefficient of variation (CV) | 1.9067417 |
| Kurtosis | 204.80754 |
| Mean | 1.7157722 × 10¹¹ |
| Median Absolute Deviation (MAD) | 2.6642143 × 10¹⁰ |
| Skewness | 10.047094 |
| Sum | 7.5655261 × 10¹⁵ |
| Variance | 1.0702938 × 10²³ |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 7.874222376 × 10¹⁰ | 181 | 0.4% |
| 5.11111504 × 10¹¹ | 168 | 0.3% |
| 5.111110018 × 10¹¹ | 163 | 0.3% |
| 7.874228544 × 10¹⁰ | 158 | 0.3% |
| 3.111112241 × 10¹¹ | 149 | 0.3% |
| 4.900000044 × 10¹⁰ | 142 | 0.3% |
| 7.874201228 × 10¹⁰ | 142 | 0.3% |
| 5.11111704 × 10¹¹ | 136 | 0.3% |
| 7.874209728 × 10¹⁰ | 110 | 0.2% |
| 7.87420364 × 10¹⁰ | 86 | 0.2% |
| Other values (11017) | 42659 | |
| (Missing) | 5735 | 11.5% |
| Value | Count | Frequency (%) |
| -1 | 8 | < 0.1% |
| 2226 | 4 | < 0.1% |
| 31059 | 12 | < 0.1% |
| 31073 | 24 | < 0.1% |
| 33749 | 8 | < 0.1% |
| 40136 | 6 | < 0.1% |
| 40945 | 42 | |
| 42185 | 4 | < 0.1% |
| 45605 | 61 | |
| 45643 | 4 | < 0.1% |
| Value | Count | Frequency (%) |
| 9.347108002 × 10¹² | 2 | |
| 8.901696552 × 10¹² | 4 | |
| 8.711700967 × 10¹² | 2 | |
| 8.69084021 × 10¹² | 2 | |
| 7.702011027 × 10¹² | 2 | |
| 7.702011003 × 10¹² | 2 | |
| 7.501103306 × 10¹² | 4 | |
| 6.970707749 × 10¹² | 4 | |
| 5.060305162 × 10¹² | 2 | |
| 5.060242153 × 10¹² | 2 |
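BARCODE is profiled as a real number, which is why it has a mean, a skewness, and a `-1` minimum even though it is an identifier. A minimal sketch of one way to turn the float values back into identifier strings follows. The helper name `normalize_barcode`, the 12-digit padding width, and the treatment of negative values as a missing-value sentinel are all assumptions, not something the report states.

```python
import math


def normalize_barcode(value, width=12):
    """Return a zero-padded digit string, or None for missing values.

    Assumptions: NaN and negative numbers (such as the -1 seen as the
    column minimum) mean "no barcode"; shorter codes are left-padded
    with zeros up to `width` digits, longer codes are kept as they are.
    """
    if value is None:
        return None
    number = float(value)
    if math.isnan(number) or number < 0:
        return None
    return str(int(number)).zfill(width)


print(normalize_barcode(15300014978.0))
print(normalize_barcode(783399746536.0))
print(normalize_barcode(float("nan")))
print(normalize_barcode(-1))
```

Values up to the reported maximum of about 9.35 × 10¹² are far below 2⁵³, so the float-to-integer conversion does not lose digits.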
FINAL_QUANTITY
Real number (ℝ)
ZEROS 
| Distinct | 87 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.80300347 |
| Minimum | 0 |
|---|---|
| Maximum | 18 |
| Zeros | 12491 |
| Zeros (%) | 25.1% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 389.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.6034035 |
|---|---|
| Coefficient of variation (CV) | 0.75143324 |
| Kurtosis | 76.865783 |
| Mean | 0.80300347 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.1670285 |
| Sum | 40012.86 |
| Variance | 0.36409579 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1 | 35536 | |
| 0 | 12491 | 25.1% |
| 2 | 1285 | 2.6% |
| 3 | 184 | 0.4% |
| 4 | 139 | 0.3% |
| 6 | 26 | 0.1% |
| 5 | 22 | < 0.1% |
| 8 | 8 | < 0.1% |
| 12 | 7 | < 0.1% |
| 7 | 7 | < 0.1% |
| Other values (77) | 124 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 12491 | |
| 0.01 | 1 | < 0.1% |
| 0.04 | 1 | < 0.1% |
| 0.09 | 2 | < 0.1% |
| 0.23 | 4 | < 0.1% |
| 0.24 | 1 | < 0.1% |
| 0.28 | 1 | < 0.1% |
| 0.35 | 1 | < 0.1% |
| 0.46 | 3 | < 0.1% |
| 0.48 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 18 | 2 | < 0.1% |
| 16 | 2 | < 0.1% |
| 12 | 7 | < 0.1% |
| 10 | 5 | < 0.1% |
| 9 | 3 | < 0.1% |
| 8 | 8 | < 0.1% |
| 7 | 7 | < 0.1% |
| 6.22 | 1 | < 0.1% |
| 6 | 26 | |
| 5.53 | 1 | < 0.1% |
FINAL_SALE
Real number (ℝ)
MISSING  SKEWED 
| Distinct | 1434 |
|---|---|
| Distinct (%) | 3.8% |
| Missing | 12486 |
| Missing (%) | 25.1% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.5840664 |
| Minimum | 0 |
|---|---|
| Maximum | 462.82 |
| Zeros | 473 |
| Zeros (%) | 0.9% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 389.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0.89 |
| Q1 | 1.82 |
| median | 3 |
| Q3 | 5.19 |
| 95-th percentile | 12.99 |
| Maximum | 462.82 |
| Range | 462.82 |
| Interquartile range (IQR) | 3.37 |
Descriptive statistics
| Standard deviation | 6.6324545 |
|---|---|
| Coefficient of variation (CV) | 1.4468496 |
| Kurtosis | 1419.5604 |
| Mean | 4.5840664 |
| Median Absolute Deviation (MAD) | 1.54 |
| Skewness | 25.106102 |
| Sum | 171182.79 |
| Variance | 43.989453 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 1.25 | 1313 | 2.6% |
| 1 | 732 | 1.5% |
| 2.99 | 587 | 1.2% |
| 1.99 | 581 | 1.2% |
| 3.99 | 567 | 1.1% |
| 2 | 534 | 1.1% |
| 3.98 | 506 | 1.0% |
| 4.99 | 484 | 1.0% |
| 0 | 473 | 0.9% |
| 1.98 | 450 | 0.9% |
| Other values (1424) | 31116 | |
| (Missing) | 12486 |
| Value | Count | Frequency (%) |
| 0 | 473 | |
| 0.01 | 3 | < 0.1% |
| 0.03 | 2 | < 0.1% |
| 0.04 | 2 | < 0.1% |
| 0.05 | 6 | < 0.1% |
| 0.07 | 1 | < 0.1% |
| 0.09 | 2 | < 0.1% |
| 0.1 | 4 | < 0.1% |
| 0.12 | 1 | < 0.1% |
| 0.13 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 462.82 | 2 | |
| 267.29 | 1 | |
| 238.17 | 2 | |
| 224.99 | 1 | |
| 139.31 | 1 | |
| 101.7 | 1 | |
| 100 | 1 | |
| 93.67 | 1 | |
| 90 | 2 | |
| 81.81 | 1 |
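Several FINAL_SALE statistics are derived from others, so they can be recomputed as a consistency check. The sketch below uses only figures from the tables above; the variable names are illustrative.

```python
# Figures copied from the FINAL_SALE tables and the dataset statistics.
n_rows = 49_829
missing = 12_486
total = 171_182.79      # Sum
std_dev = 6.6324545     # Standard deviation
q1, q3 = 1.82, 5.19     # Quartiles

count = n_rows - missing     # non-missing observations
mean = total / count         # Mean = Sum / count
cv = std_dev / mean          # Coefficient of variation
variance = std_dev ** 2      # Variance
iqr = q3 - q1                # Interquartile range

print(f"count={count} mean={mean:.7f} cv={cv:.7f}")
print(f"variance={variance:.6f} iqr={iqr:.2f}")
```

The recomputed values agree with the reported mean, CV, variance, and IQR to the precision shown in the report.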
CREATED_DATE
Date
MISSING 
| Distinct | 90 |
|---|---|
| Distinct (%) | 34.7% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Memory size | 389.4 KiB |
| Minimum | 2017-07-21 19:42:14+00:00 |
|---|---|
| Maximum | 2024-07-01 13:42:31+00:00 |
BIRTH_DATE
Date
MISSING 
| Distinct | 90 |
|---|---|
| Distinct (%) | 34.7% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Memory size | 389.4 KiB |
| Minimum | 1943-09-03 05:00:00+00:00 |
|---|---|
| Maximum | 1997-02-25 00:00:00+00:00 |
STATE
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 32 |
|---|---|
| Distinct (%) | 12.4% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Memory size | 389.4 KiB |
Length
| Max length | 2 |
|---|---|
| Median length | 2 |
| Mean length | 2 |
| Min length | 2 |
Characters and Unicode
| Total characters | 518 |
|---|---|
| Distinct characters | 21 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | FL |
|---|---|
| 2nd row | NY |
| 3rd row | WI |
| 4th row | WI |
| 5th row | FL |
Common Values
| Value | Count | Frequency (%) |
| FL | 34 | 0.1% |
| IL | 18 | < 0.1% |
| PA | 18 | < 0.1% |
| NY | 18 | < 0.1% |
| NC | 14 | < 0.1% |
| WI | 14 | < 0.1% |
| GA | 12 | < 0.1% |
| CA | 12 | < 0.1% |
| VA | 10 | < 0.1% |
| OK | 10 | < 0.1% |
| Other values (22) | 99 | 0.2% |
| (Missing) | 49570 |
Length
| Value | Count | Frequency (%) |
| fl | 34 | 13.1% |
| il | 18 | 6.9% |
| pa | 18 | 6.9% |
| ny | 18 | 6.9% |
| nc | 14 | 5.4% |
| wi | 14 | 5.4% |
| ga | 12 | 4.6% |
| ca | 12 | 4.6% |
| va | 10 | 3.9% |
| ok | 10 | 3.9% |
| Other values (22) | 99 |
Most occurring characters
| Value | Count | Frequency (%) |
| A | 68 | |
| L | 60 | |
| N | 54 | |
| C | 45 | 8.7% |
| I | 42 | 8.1% |
| F | 34 | 6.6% |
| O | 26 | 5.0% |
| Y | 26 | 5.0% |
| W | 24 | 4.6% |
| T | 21 | 4.1% |
| Other values (11) | 118 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 518 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 518 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 518 |
LANGUAGE
Categorical
HIGH CORRELATION  IMBALANCE  MISSING 
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Memory size | 389.4 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 2 |
| Mean length | 2.1853282 |
| Min length | 2 |
Characters and Unicode
| Total characters | 566 |
|---|---|
| Distinct characters | 7 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | en |
|---|---|
| 2nd row | en |
| 3rd row | en |
| 4th row | en |
| 5th row | en |
Common Values
| Value | Count | Frequency (%) |
| en | 247 | 0.5% |
| es-419 | 12 | < 0.1% |
| (Missing) | 49570 |
Common Values (Plot)
| Value | Count | Frequency (%) |
| en | 247 | |
| es-419 | 12 | 4.6% |
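The 72.9% imbalance reported for LANGUAGE can be reproduced from the two counts above. One definition that yields this figure is one minus the Shannon entropy of the class distribution, normalized by the maximum entropy for that number of classes. The sketch below implements that definition. Treating it as the profiler's own formula is an assumption based only on the numbers matching, and returning 0.0 for a single class is a choice made here.

```python
import math


def imbalance_score(counts):
    """1 - normalized Shannon entropy: 0 is balanced, near 1 is one-sided."""
    total = sum(counts)
    probs = [count / total for count in counts if count > 0]
    if len(probs) < 2:
        return 0.0  # choice made for this sketch: one class is not scored
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1 - entropy / math.log2(len(probs))


# Counts for "en" and "es-419" from the Common Values table.
score = imbalance_score([247, 12])
print(f"{100 * score:.1f}%")
```

A perfectly balanced column scores 0 under this definition, so `imbalance_score([50, 50])` returns `0.0`.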
Most occurring characters
| Value | Count | Frequency (%) |
| e | 259 | |
| n | 247 | |
| s | 12 | 2.1% |
| - | 12 | 2.1% |
| 4 | 12 | 2.1% |
| 1 | 12 | 2.1% |
| 9 | 12 | 2.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 566 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 566 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 566 |
GENDER
Categorical
HIGH CORRELATION  MISSING 
| Distinct | 2 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Memory size | 389.4 KiB |
Length
| Max length | 6 |
|---|---|
| Median length | 6 |
| Mean length | 5.6602317 |
| Min length | 4 |
Characters and Unicode
| Total characters | 1466 |
|---|---|
| Distinct characters | 5 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | female |
|---|---|
| 2nd row | male |
| 3rd row | female |
| 4th row | female |
| 5th row | male |
Common Values
| Value | Count | Frequency (%) |
| female | 215 | 0.4% |
| male | 44 | 0.1% |
| (Missing) | 49570 |
Common Values (Plot)
| Value | Count | Frequency (%) |
| female | 215 | |
| male | 44 | 17.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 474 | |
| m | 259 | |
| a | 259 | |
| l | 259 | |
| f | 215 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1466 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 1466 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 1466 |
AGE
Real number (ℝ)
HIGH CORRELATION  MISSING 
| Distinct | 41 |
|---|---|
| Distinct (%) | 15.8% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 52.108108 |
| Minimum | 28 |
|---|---|
| Maximum | 81 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 389.4 KiB |
Quantile statistics
| Minimum | 28 |
|---|---|
| 5-th percentile | 32 |
| Q1 | 40 |
| median | 51 |
| Q3 | 62.5 |
| 95-th percentile | 76 |
| Maximum | 81 |
| Range | 53 |
| Interquartile range (IQR) | 22.5 |
Descriptive statistics
| Standard deviation | 14.118405 |
|---|---|
| Coefficient of variation (CV) | 0.27094449 |
| Kurtosis | -1.0257722 |
| Mean | 52.108108 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | 0.19745528 |
| Sum | 13496 |
| Variance | 199.32935 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 70 | 13 | < 0.1% |
| 60 | 12 | < 0.1% |
| 37 | 12 | < 0.1% |
| 35 | 12 | < 0.1% |
| 49 | 12 | < 0.1% |
| 44 | 10 | < 0.1% |
| 36 | 10 | < 0.1% |
| 46 | 10 | < 0.1% |
| 40 | 10 | < 0.1% |
| 43 | 10 | < 0.1% |
| Other values (31) | 148 | 0.3% |
| (Missing) | 49570 |
| Value | Count | Frequency (%) |
| 28 | 8 | |
| 31 | 4 | < 0.1% |
| 32 | 2 | < 0.1% |
| 33 | 4 | < 0.1% |
| 34 | 6 | |
| 35 | 12 | |
| 36 | 10 | |
| 37 | 12 | |
| 38 | 2 | < 0.1% |
| 40 | 10 |
| Value | Count | Frequency (%) |
| 81 | 4 | < 0.1% |
| 80 | 2 | < 0.1% |
| 76 | 8 | |
| 75 | 4 | < 0.1% |
| 73 | 8 | |
| 71 | 8 | |
| 70 | 13 | |
| 69 | 2 | < 0.1% |
| 67 | 2 | < 0.1% |
| 66 | 4 | < 0.1% |
ACCOUNT_AGE_MONTHS
Real number (ℝ)
HIGH CORRELATION  MISSING 
| Distinct | 49 |
|---|---|
| Distinct (%) | 18.9% |
| Missing | 49570 |
| Missing (%) | 99.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 36.752896 |
| Minimum | 8 |
|---|---|
| Maximum | 91 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 389.4 KiB |
Quantile statistics
| Minimum | 8 |
|---|---|
| 5-th percentile | 9 |
| Q1 | 22 |
| median | 32 |
| Q3 | 51.5 |
| 95-th percentile | 74 |
| Maximum | 91 |
| Range | 83 |
| Interquartile range (IQR) | 29.5 |
Descriptive statistics
| Standard deviation | 19.712683 |
|---|---|
| Coefficient of variation (CV) | 0.53635727 |
| Kurtosis | -0.24103239 |
| Mean | 36.752896 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 0.63860915 |
| Sum | 9519 |
| Variance | 388.58987 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 32 | 16 | < 0.1% |
| 29 | 14 | < 0.1% |
| 25 | 12 | < 0.1% |
| 53 | 11 | < 0.1% |
| 30 | 10 | < 0.1% |
| 55 | 10 | < 0.1% |
| 43 | 10 | < 0.1% |
| 51 | 8 | < 0.1% |
| 39 | 8 | < 0.1% |
| 33 | 8 | < 0.1% |
| Other values (39) | 152 | 0.3% |
| (Missing) | 49570 |
| Value | Count | Frequency (%) |
| 8 | 8 | |
| 9 | 8 | |
| 10 | 6 | |
| 11 | 6 | |
| 13 | 2 | < 0.1% |
| 16 | 6 | |
| 17 | 6 | |
| 18 | 6 | |
| 19 | 6 | |
| 20 | 2 | < 0.1% |
| Value | Count | Frequency (%) |
| 91 | 2 | < 0.1% |
| 90 | 2 | < 0.1% |
| 80 | 4 | |
| 77 | 2 | < 0.1% |
| 74 | 6 | |
| 72 | 4 | |
| 71 | 4 | |
| 68 | 2 | < 0.1% |
| 64 | 4 | |
| 62 | 2 | < 0.1% |
Correlations
| | ACCOUNT_AGE_MONTHS | AGE | BARCODE | FINAL_QUANTITY | FINAL_SALE | GENDER | LANGUAGE | STATE |
|---|---|---|---|---|---|---|---|---|
| ACCOUNT_AGE_MONTHS | 1.000 | 0.082 | 0.109 | 0.016 | 0.047 | 0.263 | 0.115 | 0.502 |
| AGE | 0.082 | 1.000 | -0.013 | -0.005 | -0.142 | 0.307 | 0.264 | 0.521 |
| BARCODE | 0.109 | -0.013 | 1.000 | -0.005 | 0.055 | 0.709 | 0.743 | 0.709 |
| FINAL_QUANTITY | 0.016 | -0.005 | -0.005 | 1.000 | 0.021 | 0.000 | 0.186 | 0.319 |
| FINAL_SALE | 0.047 | -0.142 | 0.055 | 0.021 | 1.000 | 0.000 | 0.000 | 0.000 |
| GENDER | 0.263 | 0.307 | 0.709 | 0.000 | 0.000 | 1.000 | 0.042 | 0.523 |
| LANGUAGE | 0.115 | 0.264 | 0.743 | 0.186 | 0.000 | 0.042 | 1.000 | 0.675 |
| STATE | 0.502 | 0.521 | 0.709 | 0.319 | 0.000 | 0.523 | 0.675 | 1.000 |
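The "High correlation" alerts can be recovered from this matrix by flagging every off-diagonal entry whose absolute value reaches a cutoff. The sketch below uses a cutoff of 0.5. That value is an assumption, chosen because it reproduces the six alerts listed earlier; the function name is illustrative.

```python
NAMES = ["ACCOUNT_AGE_MONTHS", "AGE", "BARCODE", "FINAL_QUANTITY",
         "FINAL_SALE", "GENDER", "LANGUAGE", "STATE"]

# Rows copied from the correlation matrix, in the same order as NAMES.
MATRIX = [
    [1.000, 0.082, 0.109, 0.016, 0.047, 0.263, 0.115, 0.502],
    [0.082, 1.000, -0.013, -0.005, -0.142, 0.307, 0.264, 0.521],
    [0.109, -0.013, 1.000, -0.005, 0.055, 0.709, 0.743, 0.709],
    [0.016, -0.005, -0.005, 1.000, 0.021, 0.000, 0.186, 0.319],
    [0.047, -0.142, 0.055, 0.021, 1.000, 0.000, 0.000, 0.000],
    [0.263, 0.307, 0.709, 0.000, 0.000, 1.000, 0.042, 0.523],
    [0.115, 0.264, 0.743, 0.186, 0.000, 0.042, 1.000, 0.675],
    [0.502, 0.521, 0.709, 0.319, 0.000, 0.523, 0.675, 1.000],
]


def high_correlations(names, matrix, threshold=0.5):
    """Map each variable to the other variables it is strongly tied to."""
    flagged = {}
    for i, row_name in enumerate(names):
        partners = [names[j] for j, value in enumerate(matrix[i])
                    if j != i and abs(value) >= threshold]
        if partners:
            flagged[row_name] = partners
    return flagged


for name, partners in high_correlations(NAMES, MATRIX).items():
    print(f"{name}: {', '.join(partners)}")
```

With this cutoff, FINAL_QUANTITY and FINAL_SALE are not flagged, and STATE is tied to five other fields, matching the alert text. Because the demographic columns are populated in only 259 rows, these coefficients rest on a small subset of the data.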
First rows
| | RECEIPT_ID | PURCHASE_DATE | SCAN_DATE | STORE_NAME | USER_ID | BARCODE | FINAL_QUANTITY | FINAL_SALE | CREATED_DATE | BIRTH_DATE | STATE | LANGUAGE | GENDER | AGE | ACCOUNT_AGE_MONTHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0000d256-4041-4a3e-adc4-5623fb6e0c99 | 2024-08-21 | 2024-08-21 14:19:06.539 Z | WALMART | 63b73a7f3d310dceeabd4758 | 15300014978.0 | 1.0 | NaN | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 1 | 0001455d-7a92-4a7b-a1d2-c747af1c8fd3 | 2024-07-20 | 2024-07-20 09:50:24.206 Z | ALDI | 62c08877baa38d1a1f6c211a | nan | 0.0 | 1.49 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 2 | 00017e0a-7851-42fb-bfab-0baa96e23586 | 2024-08-18 | 2024-08-19 15:38:56.813 Z | WALMART | 60842f207ac8b7729e472020 | 78742229751.0 | 1.0 | NaN | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 3 | 000239aa-3478-453d-801e-66a82e39c8af | 2024-06-18 | 2024-06-19 11:03:37.468 Z | FOOD LION | 63fcd7cea4f8442c3386b589 | 783399746536.0 | 0.0 | 3.49 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 4 | 00026b4c-dfe8-49dd-b026-4c2f0fd5c6a1 | 2024-07-04 | 2024-07-05 15:56:43.549 Z | RANDALLS | 6193231ae9b3d75037b0f928 | 47900501183.0 | 1.0 | NaN | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 5 | 0002d8cd-1701-4cdd-a524-b70402e2dbc0 | 2024-06-24 | 2024-06-24 19:44:54.247 Z | WALMART | 5dcc6c510040a012b8e76924 | 681131411295.0 | 0.0 | 1.46 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 6 | 000550b2-1480-4c07-950f-ff601f242152 | 2024-07-06 | 2024-07-06 19:27:48.586 Z | WALMART | 5f850bc9cf9431165f3ac175 | 49200905548.0 | 1.0 | NaN | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 7 | 00096c49-8b04-42f9-88ce-941c5e06c4a7 | 2024-08-19 | 2024-08-21 17:35:21.902 Z | TARGET | 6144f4f1f3ef696919f54b5c | 78300069942.0 | 0.0 | 3.59 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 8 | 000e1d35-15e5-46c6-b6b3-33653ed3d27e | 2024-08-13 | 2024-08-13 18:21:07.931 Z | WALMART | 61a6d926f998e47aad33db66 | 52000011227.0 | 1.0 | NaN | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 9 | 0010d87d-1ad2-4e5e-9a25-cec736919d15 | 2024-08-04 | 2024-08-04 18:01:47.787 Z | ALDI | 66686fc2e04f743a096ea808 | nan | 0.0 | 2.29 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
Last rows
| | RECEIPT_ID | PURCHASE_DATE | SCAN_DATE | STORE_NAME | USER_ID | BARCODE | FINAL_QUANTITY | FINAL_SALE | CREATED_DATE | BIRTH_DATE | STATE | LANGUAGE | GENDER | AGE | ACCOUNT_AGE_MONTHS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49819 | 441b9ecd-38ed-4960-9780-eb44a464284a | 2024-06-26 | 2024-07-02 09:37:07.656 Z | FRY'S FOOD STORE | 6251c788e3d6762c55855c1d | 72250021081.0 | 1.0 | 2.49 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49820 | 840c30ae-bc0a-40a4-a47d-052ed0af2da2 | 2024-08-18 | 2024-08-18 14:44:02.530 Z | COSTCO | 65b322787050d0a6206b3479 | 14074349.0 | 1.0 | 11.99 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49821 | 68f74fb3-ccf2-41f3-896a-799eb9a80680 | 2024-08-13 | 2024-08-19 11:06:59.023 Z | PEPPERIDGE FARM | 64f4aee2b84ba41db3fb246a | 14100071198.0 | 1.0 | 2.89 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49822 | f6d3e61d-488d-448b-8148-8d681e55b3d2 | 2024-09-01 | 2024-09-06 08:03:54.617 Z | TARGET | 61056fcc1efef449f0f39f7c | 85239042663.0 | 1.0 | 3.46 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49823 | 6cdf3c1a-78b3-4fb0-85fd-52e2f5b4731c | 2024-06-26 | 2024-07-01 11:00:39.769 Z | HARRIS TEETER | 5de7ec93ca63cc17893cdd14 | nan | 1.0 | 3.00 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49824 | b5cd61a9-8033-4913-a5c4-fb3f65e3a321 | 2024-08-21 | 2024-08-31 14:13:08.634 Z | TARGET | 6154bcf098f885648de2f299 | 85239110669.0 | 2.0 | 1.18 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49825 | e1b2f634-c9ad-4152-b662-4b22efc25862 | 2024-08-11 | 2024-08-11 18:15:56.736 Z | STOP & SHOP | 60aa809f188b926b2244c974 | 46100400555.0 | 1.0 | 2.00 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49826 | b07ef8dd-e444-40a2-819b-f74a3e5f1ae7 | 2024-07-11 | 2024-07-11 08:03:25.816 Z | WALMART | 60bd26e83dc3b13a15c5f4e7 | 646630019670.0 | 1.0 | 20.96 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49827 | 42475141-bef4-4df2-aa37-72577e2512bb | 2024-06-18 | 2024-06-18 19:57:32.211 Z | MARKET BASKET | 6169912fac47744405af62b7 | 41800501519.0 | 1.0 | 3.00 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
| 49828 | 3a179c4e-46f2-4126-b3d2-3514afc23a3e | 2024-08-07 | 2024-08-07 15:30:07.911 Z | WALMART | 64e94d64ca929250373ef6e1 | 307660745853.0 | 1.0 | 5.48 | NaT | NaT | NaN | NaN | NaN | NaN | NaN |
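The sample rows show that SCAN_DATE can trail PURCHASE_DATE by several days. A minimal sketch of computing that lag with the standard library follows. The function name `scan_lag_days` is illustrative, and comparing the timezone-free purchase date against the UTC calendar date of the scan is an assumption about how the two columns relate.

```python
from datetime import date, datetime, timezone


def scan_lag_days(purchase_date, scan_timestamp):
    """Whole days between the purchase date and the scan's UTC date."""
    purchased = date.fromisoformat(purchase_date)
    scanned = datetime.strptime(
        scan_timestamp, "%Y-%m-%d %H:%M:%S.%f Z"
    ).replace(tzinfo=timezone.utc)
    return (scanned.date() - purchased).days


# Rows 0, 49819, and 49824 from the sample tables.
print(scan_lag_days("2024-08-21", "2024-08-21 14:19:06.539 Z"))
print(scan_lag_days("2024-06-26", "2024-07-02 09:37:07.656 Z"))
print(scan_lag_days("2024-08-21", "2024-08-31 14:13:08.634 Z"))
```

A negative result would indicate a scan dated before its purchase, which is worth checking as a data-quality issue.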